The Latency Hiding Effectiveness of Decoupled Access/Execute Processors
Authors
Abstract
Several studies have shown that out-of-order execution may not be the most suitable organization for wide-issue processors, because wire delays will impose growing penalties on the issue logic. The main purpose of out-of-order execution is to hide functional unit latencies and memory latency. However, the former can be handled quite effectively at compile time, and this observation is one of the main arguments for the emerging EPIC architectures. In this paper, we demonstrate that a decoupled access/execute organization is very effective at hiding memory latency, even when that latency is very high. Therefore, an in-order decoupled access/execute organization with an EPIC architecture is a promising alternative for future wide-issue processors. This paper presents a thorough evaluation of such a processor organization. First, a generic decoupled access/execute architecture is defined and evaluated. Then the benefits of a lockup-free cache, control speculation, and a store-load bypass mechanism under this architecture are evaluated. Our analysis indicates that these techniques can hide memory latency almost completely.
Similar resources
Improving Latency Tolerance of Multithreading through Decoupling
The increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. This work presents and evaluates a novel processor microarchitecture which combines two paradi...
Multithreaded Decoupled Access/Execute Processors
This work presents and evaluates a novel processor microarchitecture which combines two paradigms: access/execute decoupling and simultaneous multithreading. We investigate how both techniques complement each other in the design of high performance next generation ILP processors. While decoupling features an excellent memory latency hiding efficiency, simultaneous multithreading supplies the in...
Towards Power Efficiency on Task-Based, Decoupled Access-Execute Models
This work demonstrates the potential of hardware and software optimization to improve the effectiveness of dynamic voltage and frequency scaling (DVFS). For software, we decouple data prefetch (access) and computation (execute) to enable optimal DVFS selection for each phase. For hardware, we use measurements from state-of-the-art multicore processors to accurately model the potential of per-co...
Simplifying Hardware for Out Of Order Execution using the Decoupling Paradigm
Future hardware and software technology will try to provide improved performance by extracting higher levels of parallelism. However, the cost of a main memory access, in terms of missed instruction issue slots, increases with faster processors and greater issue widths. For this reason, latency hiding technology remains one of the most important parts of high-performance processor designs. In this ...
Computer Systems Group Design Issues for Latency Hiding on an Access Decoupled
Future software and hardware technologies will try to provide improved performance by extracting higher levels of parallelism. However, the cost of a main memory access, in terms of missed instruction slots, increases with faster processors and greater issue widths. For this reason, latency hiding technology remains one of the most important parts of high-performance processor designs. In this pape...